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Abstract. In many applications of Monte Carlo nonlinear filtering, the propagation step is com- 
putationally expensive, and hence, the sample size is limited. With small sample sizes, the update 
step becomes crucial. Particle filtering suffers from the well-known problem of sample degener- 
acy. Ensemble Kalman filtering avoids this, at the expense of treating non-Gaussian features of 
the forecast distribution incorrectly. Here we introduce a procedure which makes a continuous 
transition indexed by 7 G [0, 1] between the ensemble and the particle filter update. We propose 
automatic choices of the parameter 7 such that the update stays as close as possible to the particle 
filter update subject to avoiding degeneracy. In various examples, we show that this procedure 
leads to updates which are able to handle non-Gaussian features of the prediction sample even in 
high-dimensional situations. 



1. Introduction 

State space models consist of a (discrete or continuous time) Markov process which is partially 
observed at discrete time points and subject to independent random errors. Estimation of the state 
at time t given observations up to the same time is called filtering or data assimilation. Since ex- 
act computations are possible essentially only in linear Gaussian situations, mostly Monte Carlo 
methods are used for filtering. In many environmental applications, in particular in atmospheric 
physics, oceanography and reservoir modelling, the dimension of the state is, however, very large 
and the computational costs to propagate the state forward in time are huge, which severely limits 
the potential sample size for Monte Carlo fil tering methods. Particle filters ([Gordon et all 11993 ; 



Pitt and 



eracy 



1998 



Shephardl. 



Snvder et al 



1999; 



Doucet et al 



2000) suffer from the well-kn own problem o f sample degen 



2008) . In contras t, the ensemble Kalman filter (jEvensenl . Il994t iBurgers et al 



Houtekamer and Mitchell 



19981 ) can handle some problems where the dimensions of states 
and observations are large, and the number of replicates is small, but at the expense of incorrectly 
treating non-Gaussian features of the forecast distribution that arise in nonlinear systems. 



proximate the forecast distribution as a mixture of Gaussians (Bengtsson et al.. 


2003; 


Sun et al. 


2009 


Dovera and Delia Rossa 


2011 


Frei and Kiinsch 


2011: Hoteit et al. 


2012 


Rezaie and Eidsvik 


2012 


, and sequential importance samplers that use the ensemble Kalman filter as a proposal distri 



bution rtMandel and Beezlev 



2009; 



Papadakis et all 120101 ) . In this article, we introduce an update 



scheme that blends these two flavours: A Gaussian mixture proposal obtained from an ensemble 
Kalman filter update based on a tempered likelihood is corrected by a particle filter update. In this 
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way we do not have to fit a Gaussian mixture to the forecast sample nor do we have to approximate 
the ratio of the predictive density to the proposal density. A further advantage of our procedure is 
that we can implement these two steps in such a way that the particle weights do not depend on 
artificial observation noise variables and the resampling avoids ties. 

Our procedure depends on a single tuning parameter 7 £ [0, 1], which allows continuous inter- 
polation between the ensemble Kalman filter (7 = 1) and the particle filter (7 = 0). Hence, the 
parameter 7 controls the bias- variance trade-off between a correct update and maintaining the di- 
versity of the sample. It can be chos en without prior knowledge based on a suitable measure of 
diversity like effective sample size ESS (jLiul . [1996) , or the expected number of Gaussian components 
which are represented in the resample. 

The rest of the article is organized as follows. In Section[21 we detail the problem setting, introduce 
some notation, and provide background material. In Section [3l we present our new method and 
discuss implementational aspects. In Section 21 we discuss the choice of the tuning parameter 7. In 
Section [5l we consider numerical examples that involve single updates only; based on different prior 
specifications, we examine the differences of our method in comparison to the ensemble Kalman 
filter and the particle filter. In Section [SJ we consider examples that involve many update cycles in 
two common test beds. Section [7] contains an outlook to possible generalizations. 



2. Problem setting, notation and background material 

We consider a dynamical system with state variable (xt € K 9 : t = 0, 1, . . .) and observations 
(yt € M r : t = 1, 2, . . .). The state follows a deterministic or stochastic Markovian evolution, that is 
Xt — g{xt-\, £t) where the system noise £t is independent of all past values x s and all £ s , s < t. There 
is no need to know the function g in explicit form, we only assume that for given Xt—i we are able to 
simulate from the distribution of g(xt—x, £t). In particular, the evolution can be in continuous time, 
given by an ordinary or stochastic differential equation. 

In all cases we assume linear observations with Gaussian noise: yt = Hx t + et, tt ~ Af(0,R). 
This means that the likelihood for the state x* given the observation y t is l(xt\yt) = <p{yt \ Hxt, R). 
Here and in the following ip(x; /i, E) denotes the (in general multivariate) normal density with mean 
Li and covariance E at x. In the final section, we will discuss briefly how to adjust the method for 
non-Gaussian likelihoods. 

We denote all observations up to time t, (yi, • ■ ■ ,yt) by y\.t- The forecast distribution 7rf at 
time t is the conditional distribution of x t given yi :t _i, and the filter distribution 7r" at time t 
is the conditional distribution of Xt given yi.t- In principle, these distributions can be computed 
recursively, alternating between propagation and update steps. The propagation step leads from 
7r"_i to 7rf : 7rf is the distribution of g(xt-i, £t) where Xt-i ~ T^t-i an d £t is independent of Xt-\ and 
has the distribution given by the evolution of the system. The update step leads from 7r^ to 7r" and is 
nothing else than Bayes formula: n^(dxt) oc i{xt\yt)^{dxt). However, analytical computations are 
possible (essentially) only if the system evolution is also linear with additive Gaussian noise. Hence 
one resorts to Monte Carlo approximations, i.e., one represents 7rf and 7r" by ensembles (samples) 
(xt j) and (Xt j) respectively. The members of these ensembles are called particles. 
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The propagation step just lets the particles evolve according to the dynamics of the state, that 
is we simulate according to the time evolution starting at X™-\j at time t — 1, x P j = g(xf_ l ■, £tj). 
However, the computational complexity of this step limits the number of particles, that is the size 
of the sample. 



The (bootstrap) particle filter ([Gordon et all 119931) updates the forecast particles by weighting 
with weights proportional to the likelihood £(x t \yt) and converts this into an unweighted sample by 
resampling, i.e., (xfj) is obtained by sampling from 



(i) 



JV 

E 

3=1 



Et^K k \yt) 



Thus some of the forecast particles disappear and others are replicated. If the likelihood is quite 
peaked, which is the case in high dimensions with many independent observations, the weights will 
be heavily unbalanced, and the filter s ample eventually degenerates since it concentrates on a single 
or a few particles; see ISnvder et al.l (|2008l) . Auxiliary particle filters (jPitt and Shephardl . Il999l ) 
can attenuate this behaviour to some extent, but they require good proposal distributions for the 
propagation, and an analytical expression for the transition densities. 



The ensemble Kalman filter ( Burgers et al 



1998: 



Houtekamer and Mitchell 



19981 ) makes an affine 



correction of the forecast particles based on the new observation y t and artificial observation noise 
variables e tJ ~ A/"(0, R): 



XT, = x?, + K(P[)(y t - Hx p + e tJ ) 



"t,3 



where Pf is an estimate of the forecast covariance at time t, typically a regularized version of the 
sample covariance of (x£ •), and K (P) is the Kalman gain K(P) = PH'{HPH' + J?) -1 . This 
update form ula is (asymptotically) correct under the assumption that the forecast distribution w P 



is Gaussian (Le Gland et al 



20101 ) . Although this is u sually no t valid , the update nevertheless has 



Evensen 



([20071 ) and references therein. For 



been found to work well in a variety of situations; see 
later use, we note that conditional on the forecast sample, 

xl 3 ~ N (xf j + K(Pf){y t - Hxy, K(P?)RK(P?y). 

Therefore the filter sample can be considered as a balanced sample from the (conditional) Gaussian 
mixture 

JV 

(2) 



1 N 

jf E N K 3 + K ( p t p )(yt - H^K{P!)RK{Pf)'). 



3=1 



Here, "balanced sample" simply means that we draw exactly one realization from each of the iV 
equally weighted Gaussian components. 



3. A BRIDGE BETWEEN ENSEMBLE AND PARTICLE UPDATES: THE ENSEMBLE KALMAN PARTICLE 

FILTER 

3.1. The new method. We consider here the update at a single fixed time t a nd thus suppress t 
in the notation. We follow the "progressive correction" idea (|Musso et all 120011 ) and write 



w u (dx) cx tt u ^ (dx)£(x\y) 



1-7 



r «,7 



(dx) cx Tr p (dx)e(x\ y y 



4 



M. FREI AND H. R. KUNSCH 



where < 7 < 1 is arbitrary. Our approach is to use an ensemble Kalman filter update to go from 
7r p to tt u, ' y , and a particle filter update to go from 7r"' 7 to ir u . The rationale behind this two-stage 
procedure is to achieve a compromise between sample diversity and systematic error due to non- 
Gaussian features of tt p . The former is large if 7 is close to one because the ensemble Kalman filter 
update draws the particles closer to the observation y, and the exponent 1 — 7 dampens the ratio of 
any two resampling probabilities. The latter is small if 7 is small. 
Since £(x\y)' y oc ip(y; Hx, R/"f), and 

(3) PH'(HPH' + R/^y 1 = jPH'^HPH' + i?)" 1 = K(jP) 

the ensemble filter update is straightforward: We only need to compute the gain with the reduced 
covariance jPP. The particle update then resamples with weights proportional to tp(y; Hx,"fR). 
However, there are two immediate drawbacks to such an algorithm: the particle weights depend on 
the artificial observation noise variables which are needed for the ensemble Kalman filter update, 
and the resampling introduces tied values. We show next how to address both points. By {2j, we 
can write 

1 N 

(4) ^ « ^EnKF = £A^;<\Q( 7 ,P P )) 

3=1 



where 



(5) tf« = a* + K{>rF«'){y-H ! g), 



(6) Q( 7 , P p ) = -K(jP p )RK(jP p )'. 

7 

Instead of sampling from Q and applying a particle correction, we delay the ensemble Kalman filter 
sampling step, and update Q analytically. This is easy because the update of a Gaussian mixture 
by a Gaussian likelihood is a gain a Gaussian mixture whose parameters can be computed easily 



(jAlspach and Sorensonl 119721 ). We obtain 



N 

71 



3 = 1 



(8) a)« ex V (y; Hvf\ HQ( 7 , F»)H' + — R), 



where EnKPF stands for ensemble Kalman particle filter and 

1 

1-7 

(9) ^ = + K((l 7 )Q(7, P p )){y ~ Hv^\ 

(10) P^ = (I- K((l - 7 )Q(7, P p ))H)Q^, P p )- 

The update consists now in sampling from ([7|). The mixture proportions oy 7 do not depend on the 



u.-y 

y 

the filter sample because the covariance P un is not zero if 7 > 



artificial observation noise variables, and even if one a"' 7 dominates, there is still some diversity in 



BRIDGING THE ENSEMBLE KALMAN AND PARTICLE FILTER 



5 



Sampling from the j-th component of (J7]) can be done as follows: Let ei and €2 be two independent 
jV(0, R) random variables. Then 

(11) x u < 7 =x p j + K( 7 P p )(y + - Hx p ) = v u ^ + K^PP)^- 

clearly has distribution A/"(^"' 7 , Q(j, P p )), and thus by standard arguments 

x u = x^ + K((l - 7 )Q( 7 , P*)) (y + -fi= - Hx 1 ^ 

\ \/l-7 

is a sample from A/"(/i"' 7 , P un ). Hence there is no need to compute a square root of P un 

p 



To summarize, given a forecast ensemble (x P ) and an observation y, the ensemble Kalman particle 



filter consists of the following steps: 



Algorithm 1 Ensemble Kalman particle filter 



1. Compute the estimated forecast covariance P p . 

2. Choose 7 and compute K(-fP p ) according to (j3|) and v^' 1 according to (0. 

3. Compute Q{^f,P p ) according to (J6j) and the weights aj' 7 according to ([8]). 

4. Choose indices I (J) by sampling from the weights ct"' 7 with some 

balanced sampling scheme; e.g., equation (12) in iKunsch (2005). 



5. Generate e ltj ~ W(0, R) and set xy 1 = + K{jPP)^. 

6. Compute K((l — 7)^(7, P p )), generate t2.j ~ A/"(0, R) and set 

x* = xj^ 7 + if ((1 - 7 )Q(7, Pp)) (v+$±j- Hxf' 



Because matrix inversion is continuous, it is easy to check that as 7 — > 0, vj n — > x p , Q("f, P p ) — > 0, 
aj' 7 — > ip(y;Hx P ,R), fi^ n — > x P and P un — > 0. Hence in the limit 7 — > 0, we obtain the particle 
filter update. Similarly, in the limit 7 — > 1 we obtain the ensemble Kalman filter update because for 
7 — > 1 (iJ Q(7, P P )H' + R/(1 — 7)) _1 converges to zero and thus a"' 7 — > 1/N. The ensemble Kalman 
particle filter therefore provides a continuous interpolation between the particle and the ensemble 
Kalman filter. 



3.2. Modifications in high dimensions. If the dimension q of the state space is larger than 
the number TV of particles, then the variability of the usual sample covariance is huge, and one 
should use a regularized version as estimate P p . In the context of the ensemble Kalman filter, the 
standard regularization technique is the use of a tapered estimate, that is we multiply the sample 
covariance matrix by a correlation matrix which is zero as soon as the distance between t he tw o 
components of the state is l arger than some threshold, see e.g., iHoutekamer and Mitchelll (|200lh : 



Furrer and Bengtssonl (|2007| ). If also the error covariance matrix R and the observation matrix H 
are sparse, then this tapering has the additional benefit that the computation of j/ 1 ' 7 in step 2 and 
of x™' 1 in step 5 is much faster because we do not need to compute K(jP p ) for this. It is sufficient to 
solve 2N equations of the form (■yHPH' + R)x = b which is fast for sparse matrices. However, this 
advantage is lost because for Q(7, P p ) we need K(-fP p ), which is in general a full matrix. We could 
multiply the gain matrix by another taper in order to facilitate the computation of Q("f, P p ) and to 
turn Q(7, P p ) into a sparse matrix. This would then make steps 4 and 6 in the algorithm above faster 
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because again all we need to do is to solve 2N equations of the form ((1 — 7)i?Q(7, P P )H' + R)x = b. 
Alternatively, one can could also replace the Rai n by the optimal matrix w ith a given sparsity pattern, 



i.e., using a localized update in grid space, see ISakov and Bertind ([20101 ) 



Because K(jP p ) is only used to compute Q(7, P p ), a simpler alternative which avoids computing- 
gain matrices is to generate the values K{^P p )e\j/ ' ^py needed in step 5 before step 3 and 4, and 
then to replace Q(j, P p ) by a sparse regularized version of the sample covariance of these values. If 
this approach is taken, it is usually feasible to generate more than N such values in order to reduce 
the Monte Carlo error in the regularized sample covariance matrix. 

3.3. Consistency of the ensemble Kalman particle filter in the Gaussian case. We establish 
consistency of the ensemble Kalman particle filter for any 7 as the ensemble size N tends to infinity, 
provided that the forecast sample is iid normal. We assume that all random quantities are defined 
on some given probability space (il, J 7 , P) and "almost surely" is short for "P-almost surely" . The 
observation y is considered to be fixed (nonrandom). The superscript - N is added to any random 
quantity that depends on the ensemble size N. We use the following notion of convergence: A 
sequence (tt n )neN of random probability measures converges almost surely weakly to the probability 
measure n if J h(x)ir N (dx) converges almost surely to J h(x)n(dx) for any continuous and bounded 
function h. 

Theorem 1. Suppose that (x P )j^fi is an iid sample from ir p = A/"(/i p , P p ). Then, for any 7 £ [0, 1], 



x) oc 



the sequence 7Tg' KPF &s defined in (J7J converges almost surely weakly to the true posterior ir 11 (d. 
(p(y;Hx,R)w p (dx). Additionally, if (x^' )j=i,...,jv is a conditionally iid sample from 7i"EnKPF' then 
also jf^2j = i A^u.iv converges almost surely weakly to ir u . 

A proof is given in the appendix. Notice that if balanced sampling is used to sample from 
^■w^ yve, the partic les are no longer conditionally iid, and the arguments become more complicated; 



sec 



Kunschl (2005) for a discussion in the context of auxiliary particle filters. Inspection of the 
proof of Theorem [T] shows that if ir p is non-Gaussian with finite second moments, then tt FdKPF still 
converges almost surely weakly to a nonrandom limit distribution Tr^nKPF- The li m it distribution 
depends on 7 and generally differs from the correct posterior ir li for 7 > 0. The limit cannot be 
easily identified, and in partic ular it is difficu lt to quantify the systematic error as a function of 7. 



Using similar arguments as in I Randies! ([1982), it is also possible to show that 



(J h(x)^f KpF (dx)- J h(x)^ PF (dx)\ ^N-(0,V) 

weakly, where the asymptotic covariance V depends on h, ir p and 7. In general, V is analytically 
intractable, and we cannot verify if V decreases as a function of 7, as we expect. 

4. Choice of 7 

4.1. Asymptotics of weights. Recall that for 7 = the method is exactly the particle filter, and 
for 7 = 1 it is exactly the ensemble Kalman filter. Hence it is clear that there is a range of values 7 
where we obtain an interesting compromise between the two methods in the sense that the weights 
(aj' 7 ) are neither uniform nor degenerate. We try to provide some theoretical insight where this 
range of values 7 is, and later we develop a criterion which chooses a good value 7 automatically. 
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We want to see how the weights a;"' 7 in ([7]) behave as a function of 7 when the dimension of the 
observations is large. By definition 



aj< 7 ex exp(-l(y - H^)'(HQ(j, P P )H' + -^—R)- 1 ^ - P*/ 1 ' 7 )) 



where is the prediction mean, 



C 7 = (1 - 7 )P'(J - iv;H')((l - l)HQ 7 H' + R)-\I - HKJH, 

d y = (1 - i)H'{I - K'ryH')((l - -y)HQ 7 H' + R)-\l - HKy)(y - H[i p ) 

and K 1 and Q 7 stand for K{^P P ) and P*). 

The following lemma gives an approximate formula for the variance of the a"' 7 - 



Lemma 1. Define approximate weights by 




1 exp(-l(a? - ^/C 7 (^ - /iP) + ri 7 (^ - yP)) 
N E [exp(-I(4 - pyC^ - ^p) + dL t ( x P - p))] ' 



where C 7 and d 7 are as defined above, but with the true forecast covariance P p instead of P p . If the 
forecast sample is iid A/"(/i p , P p ), then 



A proof is given in the appendix. The matrix M is positive definite and also the trace in formula 
(|12p is positive. Therefore we expect that the variance of the weights is of the order 0(iV~ 2 (l — 7) 2 g): 
This is true if P p , R and H are all multiples of the identity, and there is no reason for a different 
behavior in other cases. This suggests that for high-dimensional observations, we need to choose 7 
close to one in order to avoid degeneracy. Note however that the final update can still differ from 
the ensemble Kalman filter update, even if the largest part of the update occurs with the ensemble 
Kalman filter. 

4.2. Criteria for the selection of 7. Because the Kalman filter update uses only the first two 
moments of the forecast distribution, it seems plausible that for non-Gaussian forecast distributions, 
the Kalman update will be less informative than the correct update. Hence, as long as the spread 
is a meaningful measure of uncertainty, we expect the Kalman update to have a larger spread than 
the correct update. This is not always true though: The variance of the correct posterior is only 
on average smaller than the variance (7 — K(P P )H)P P of the ensemble Kalman filter update. In 
particular, this may fail to hold for some values of y if the prior is multimodal. 

Still, this heuristic suggests that in some cases the spread of the update ensemble will increase 
monotonically in 7 and we could choose 7 such that the spread of the update is not smaller than a 



iV 2 Var [a] 



det(P p C 7 + 1) 



ex P K((C 7 + (P^)- 1 /2)- 1 - (C, + (PT 1 )- 1 )^) - 1. 



det(2PPC* 7 +P//2 



As 7 7 1; we have 



(12) N 2 V&r [eT^] ~ (1 - 7) 2 ( - tr(HP p H'M) + (y - HpP)'M{y - Hfx p )), 

where M = (I — K[H')R~ 1 (I — HKi)HP p H'(I - K^H^Br 1 (I - HK\). 
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factor r times the spread of an ensemble Kalman filter update (where r is maybe between 0.5 and 
0.8). This means that we compare the variances of the Gaussian mixture 2j=i Tr-N't/i™' 1 , P un ), 
where Nj denotes the number of times component j in ([7]) has been selected, with the variances 
of the Kalman filter update, that is the diagonal elements of (/ — K(P P )H)P P . However, this is 
computationally demanding, because we have to compute among other things P 11 '^; compare the 
discussion in Section [3.21 above. 

A simpler procedure is based on the standard deviations of the updated ensembles. Denote by 
af the standard deviation of the i-th component of the final update sample x™ (which depends on 
7) and by af' En the standard deviation of the update sample using the ensemble Kalman filter. 
Then we could choose the smallest 7 such that J2i > T Ei a£' En or (in order to control not only 
the total spread but all marginal spreads) such that J2i mm flj ^"(ct"' 5 ") -1 ^ > tN. This has the 
advantage that it is easy to compute also in high dimensions. The disadvantage is that it depends 
also on the generated noises. This can be reduced somehow by taking the same noises e± j and e2j 
for all values of 7 under consideration. 

A third possibility is to look only at the properties of the weigh ts a" ' 7 . W e can take as the measure 
of the sampling diversity the so-called effective sample size ESS (jLiul . Il99a) , which is defined as 

1 N 



ESS 



or the quantity 



div = Na]") = N(l - i £ |aj< 7 - 1/JV|), 

which is the expected number of components that are chosen when generating (x") according to 
([7]) with balanced sampling. Both measures are related to a distance between the a"' 7 and uniform 
weights. Although there is no universal relation between the two criteria, in typical cases ESS < div, 
i.e., ESS is a more conservative measure of diversity. Both criteria do not take the spread in P un 
into account, which also increases if 7 increases. Therefore, they give only a lower bound for the 
diversity, but they are easy to compute. We then choose 7 as the smallest value for which ESS > tN 
or DIV > tN. In order to avoid excessive computations, we considered in the examples below only 
multiples of 1/15 as values for 7 and used a binary search tree (with at most 4 search steps) , assuming 
that the diversity is increasing in 7. We did not try to prove this assumption since the calculation 
is expected to be extremely tedious; the assumption is safe to make, though, since at the worst we 
end up with a too large 7. Alternatively one could use an approximation of ESS based on (fT2")l . 



5. Examples of single updates 

5.1. Description of the setup. We consider a single update for 4 situations. In all cases H = I 
and R = a 2 1. There are 2 forecast samples (x^) combined with 2 values y = y\ and y = yi 
for the observations. The construction of the forecast sample starts with a sample Zj ~ Af q (0,I), 
which is then modified to introduce non-Gaussian features. More precisely, we consider the following 
situations: 
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0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 

y y 

Figure 1. Diversity iV -1 x ESS as a function of 7. Top row: Gaussian prior, bottom row: 
bimodal prior. Left column: j/i, right column 2/2 • Dimension q of the state variable (from 
top line to bottom line): q = 10, 50, 250. The dotted lines show the approximate diversity 
computed from (|12|l with fi p and P p estimated from the sample. 

1. A Gaussian prior: We set a = 0.5, x P — Zj, y\ = (0, . . . , 0)', and 2/2 = (1-5, 1.5, 0, . . . , 0)'. 
This means that the second observation contradicts the prior, although not excessively. 

2. A bimodal prior: We set a = 3, x p = z 3 for j < N/2 and x P = z 3 + (6,0,..., 0)' for j > N/2, 
Hi = (—2, 0, . . . , 0)' and 2/2 = (3, 0, . . . , 0). In the former case, the true posterior is unimodal 
and in the latter case it is bimodal. 

We take TV = 50 and q = 10, 50, 250. We generate one sample in dimension 250 and use the first 
q components. In all cases, we use a triangular taper with range 10, assuming that the states are 
values along a line (the optimal taper has range 0, since the true covariance P p is diagonal). Without 
a taper, all procedures break down: They become overconfident as the dimension increases, and the 
estimated sampling diversity increases with q. 

5.2. Variation of 7. We compute the update for 7 £ {0, 0.05, 0.10, . . . , 1} and take as the measure 
of the sampling diversity the quantity A^ 1 x ESS introduced above. Some results are shown in 
Figure [1] The diversity increases with 7 and decreases with the dimension q as expected. In the 
bimodal case, even the particle filter does not degenerate, and typically small or moderate values of 
7 apparently give sufficient diversity. 
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5.3. The updates of the first two coordinates. We concentrate on the first two coordinates of 
xj' 7 which contain the non-Gaussian features (if present). We show the contours of the true update 
density ([7]) for the ensemble Kalman filter and for the filter with 7 chosen such that the diversity 
t = TV -1 • ESS is approximately 40%. Figure [2] shows the results for bimodal prior with q = 250. 
In case of the Gaussian prior, the two plots (which are not shown here) are virtually identical. In 
the non-Gaussian situation, the combined filter is able to pick up some non-Gaussian features. In 
particular, the shape and not only the location depends on the observation, and the bimodality of 
the posterior is captured. 



6. Examples of filtering with many cycles 



6.1. The Lorenz 96 model. The 40-variable configuration of the Lorenz 96 model (jLorenz and Emanuel 
19981 ) is governed by the ordinary differential equation 



dxl 

dt 



(X, 



X 



k-2- 



X 



k-1 



X & + 8, fc = l,...,40. 



where the boundary conditions are assumed to be cyclic, i.e., X k = X i0+k . The model is chaotic 
and mimics the time-evolution of a scalar meteo r ologic al qu antity on a latitude circ le. We adopt the 
same experimental setup as in 



Bengtsson et al 



(2003) and iFrei and Kiinschl (|201ll ): Measurements 
of odd components X 2k ~ x with uncorrelated additive 7V(0, 0.5) noise at observation times 0.4 x n, 
n = 1, . . . ,2000, are taken. The large lead time produces a strongly nonlinear propagation step. 
The system is integrated using Euler's method with step size 0.001. Both the ensemble Kalman 
filter and ensemble Kalman particle filter are run with N = 400 ensemble members. The true 
initial state and the initial ensemble members are randomly drawn from A/40 (0,7). All sample 
covariance matrices are replaced by tapered estimates; for the sake of simplicity, we used the same 
tap er matrix C throughout, namely the GC taper constructed from the correlation function given 
in (jGaspari and Cohnl . Il999l equation (4.10)) with support half-length c = 10. For the ensemble 
Kalman particle filter, the parameter 7 is chosen adaptively to ensure that the diversity r = TV -1 x 
ES S stays within a prespe cified interval [tq,ti] C [0,1] if possible. We also ran the filter proposed 



Papadakis et al.l (|2010h . For the given ensemble size, we were not able to obtain a non-divergent 



run; the filter collapsed after just a few cycles. 

The filter performance is assessed via a scoring rule evaluated at observation times. Here, we 
use the root mean squ are error of the ensemble mean, and the continuous ranked probability score 
([Gneiting et all 120071 ) for the first two state variables. More precisely, if X k is the true solution at 
time t, and X k the mean of the updated ensemble, and F k (y) the marginal empirical cumulative 
distribution function of the updated ensemble, then the root mean square error of the ensemble 
mean at time t is 



(13) 



RMSE t 



\ 



k=l 



and the continuous ranked probability score for the fcth variable at time t is 



(14) 



crps; 



(F t k (y) 



- 1 



{y>X?} 



dy, fc = l,2 
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Figure 2. First two components of the update of the bimodal prior with q — 250. Upper 
row: j/i, lower row: 7/2 • Left column: 7 chosen to achieve a diversity of about 40%. Right 
column: Ensemble Kalman filter. The prior sample is shown light grey, the observation is 
marked with a cross. The contours show the Gaussian mixture ([7]): Levels are equidistant 
on a log scale such that the lowest level corresponds to 1% of the maximum. 



where t = 0.4 X n, n = 1, . . . , 2000. Notice that for reasons of symmetry, we only consider the CRPS 
of the first two state variables (i.e., one observed and one unobserved variable). 

Tables Q] and [5] compile summaries (first and ninth decile, mean and median) of the 2000 RMSE 
and CRPS values. 

The gain over the ensemble Kalman filter achieved by the ensemble Kalman particle filter is 
substantial. In particular, as the CRPS values show, the ensemble Kalman particle filter is able to 
track the unobserved states much more accurately than the ensemble Kalman filter. Overall, the 
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[r ,ri] 10% 50% mean 90% 

EuKF 0.56 0.81 0.87 1.25 

EnKPF [0.80,0.90] 0.52 0.75 0.83 1.21 

EnKPF [0.50,0.80] 0.51 0.73 0.80 1.18 

EnKPF [0.30,0.60] 0.50 0.71 0.79 1.17 

EnKPF [0.25,0.50] 0.49 0.70 0.78 1.16 

EnKPF [0.10,0.30] 0.49 0.71 0.79 1.17 



Table 1 . Lorenz 96 system, experimental setup as given in Section lfTTl summary statistics 
of rmse (|13p over 2000 cycles using the ensemble Kalman filter (EnKF) and the ensemble 
Kalman particle filter (EnKPF) with constrained diversity r = A r_1 ESS G [to, Tl] for the 
weights. 



X 1 X 1 

[t ,ti] 10% 50% mean 90% 10% 50% mean 90% 

EnKF 0.12 0.22 0.32 0.65 0.14 0.38 0.57 1.18 

EnKPF [0.80, 0.90] 0.11 0.21 0.30 0.62 0.13 0.33 054 1A3 

EnKPF [0.50,0.80] 0.11 0.21 0.29 0.61 0.12 0.32 0.51 1.10 

EnKPF [0.30,0.60] 0.11 0.20 0.29 0.59 0.12 0.32 0.49 1.02 

EnKPF [0.25,0.50] 0.10 0.20 0.28 0.58 0.11 0.31 0.48 1.00 

EnKPF [0.10,0.30] 0.10 0.21 0.29 0.59 0.11 0.31 0.50 1.05 



Table 2. Lorenz 96 system, experimental setup as given in Section lfTTl summary statistics 
of CRPS (|14p over 2000 cycles for the state variables X 1 (observed) and X 2 (unobserved) 
using the ensemble Kalman filter (EnKF) and the ensemble Kalman particle filter (EnKPF) 
with constrained diversity r = 7V _1 ESS G [to, ti] for the weights. 



re sults for the ensemble K alman particle filter are comparable with those reported for the XEnKF 
in 



Frei and Kunschl ([201 11 ) . Arguably, the best performance of the ensemble Kalman particle filter 



is achieved with diversity constrained to [0.25,0.50], but the scores are surprisingly robust. Finally, 
we note that for smaller ensemble sizes, e.g., N — 100, the ensemble Kalman particle filter still 
improves over the ensemble Kalman filter, but the results (which are not shown here) are less 
impressive. Nevertheless, one should keep in mind that with a particle filter, even with judicious 
tu ning, much mo r e part icles are required to compete with the ensemble Kalman filter, as illustrated 



in Bocauet et al 



(|2010h (for a slightly different configuration of the Lorenz model). 



6.2. The Korteweg-de Vr ies equation. We consider the Korteweg-de Vries equation on the circle 
(jDrazin and Johnsonl Il989l ): 

d t x + d 3 s x + 3d s x 2 = 

with domain (s, t) G [—1,1) x [0, oo) and periodic boundary conditions, x(s — — = x(s = 1, t) 



van Leeuwen 



Versio n s of this equation have bee n u sed as test beds for data assim ilation in, e.g., 
(|2003l) . lLawson and Hansen! (|2005l) . or IZupanski and Zupanskil (|2006l) . The spatial domain [—1, 1) 
is discretized using an equispaced grid with q = 128 grid points. The spectral split step method is 
used to solve the equation numerically (with an explicit 4th order Runge-Kutta time step for the 




Figure 3. Korteweg-de Vries equation. The left figure shows the initial f 6- member en- 
semble (grey) at time t = together with the true solution (black). The right figure shows 
the predictive ensemble (grey) at the first observation time t = 0.01 together with the 
observations (black bullets) and the true solution (black). 



nonlinear part of the equation). As initial prior we take the random field 

X(s,t= 0) = cxp , togfa) ~ W(log(0.05),log(0.3)). 

For the truth, we use r/ — 0.2, and the initial ensemble is a (quasi-random) sample from X(s, t = 0). 
The ensemble size is N — 16, and thus N <C q. Six irregularly spaced observations with uncorrelated 
additive A/"(0,0.02) noise at observation times 0.01 X n, n = 1, . . . , 10, are taken. For illustration, 
Figure |3] displays the initial 16-member ensemble and the predictive ensemble at the first observation 
time (together with the observations). 

The particle filter, the ensemble Kalman filter and the ensemble Kalman particle filter are run 
(with no tapering applied). For the ensemble Kalman particle filter, we fix 7 = 0.05, which ensures 
that t = N^ 1 ■ ESS lies roughly in the interval [0.80,0.90]. Since the particle filter degenerates very 
quickly for such a small ensemble, a benchmark run with N = 256 particles is carried out. Figure 2] 
displays ensemble deviations from the true solution after 1 and 10 update cycles. Apparently, both 
the ensemble Kalman filter and ensemble Kalman particle filter track the true solution reasonably 
well, and the state uncertainty is well represented. In terms of error of the ensemble mean, there 
is not much difference between the ensemble Kalman filter and the ensemble Kalman particle filter. 
However, the ensemble Kalman particle filter produces particles that exhibit less dynamical incon- 
sistencies. In Figure El for each filter, the particle with the most curvature after 10 update cycles is 
shown, where the curvature of a solution x(s, t) is defined by \d%x\ (1 + (d s x) 2 )~ 3 / 2 ds, and a finite 
difference approximation is used for the discretized solutions. For reference, we note that the true 
solution (not shown in the plots) is virtually identical to the particle shown in the rightmost plot. 
The particle filter (which conserves dynamical constraints) yields very smooth particles, whereas the 
ensemble Kalman filter may produce wiggly, "unphysical" particles. The ensemble Kalman particle 
filter lies in-between: some particles still show slight dynamical imbalances, but these are much less 
pronounced than for the ensemble Kalman filter. Also, if a taper is applied to the forecast covariance 
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Figure 4. Korteweg-de Vries equation. Ensemble deviations about the truth, i.e., filtered 
ensemble minus true solution, after the fst (t = 0.01, left panel) and 10th (t = 0.1, right 
panel) update cycle, for the ensemble Kalman filter and ensemble Kalman particle filter 
with TV = 16 particles (top two rows), and for the particle filter with N = 256 particles 
(bottom row) . The solid grey lines are the deviations, the dotted grey lines the average of 
the deviations, and the black bullets are the observations minus the truth. 

matrix (which is not the case in the example here), the ensemble Kalman filter suffers even more 
from these imbalances. 

7. Outlook to possible generalizations 



In the spirit of the "progressive correction" idea (jMusso et all 120011) , the ensemble Kalman par 



ticle filter update could also be split up in several steps. We fix constants ji > and Si > with 
J2i=i Ji + di = 1- Then, for i = 1, . . . , L, we apply an ensemble Kalman filter update with likelihood 



BRIDGING THE ENSEMBLE KALMAN AND PARTICLE FILTER 



15 




Figure 5. Korteweg-de Vries equation. Particle with most curvature after 10 update 
cycles (t = 0.1), for the ensemble Kalman filter (left), ensemble Kalman particle filter 
(middle) and particle filter (right). 



£(x|y) 7i , followed by a particle filter update with likelihood £(x\y) Si , followed by the resampling 
step. It is only necessary to estimate the predictive covariance for the first step; for the subsequent 
steps, i = 2, . . . , L, we can compute the covariance analytically from the mixture representation 
([7]) (for large q, this is numerically delicate, but the same remedies as discussed in Section [3721 can 
be applied). We expect that the bias of such an iterative ensemble Kalman particle filter update is 
similar as for a single ensemble Kalman particle filter update with 7 = ji, but the variance will 

decrease with increasing L since the likelihoods become flatter. In the limiting case Yli=i li ~ * 0> 
which corresponds to a full particle filter update, we conjecture that L = 0(q) is sufficient to retain 



the sampling diversity. This claim is supported by Bcs kos et al.l (|2012f) , who analyze the "tempering" 
idea in simpler but related situations. 

A potential drawbac k of the ensemb l e Kal man particle filter (in comparison to non-Gaussian 
ensemble filters akin to iLei and Bickell (|2012l) ) is its restriction to Gaussian linear observations. 



However, the idea of combining an ensemble Kalman filter and a particle filter update could also 
be used for arbitrary observation densities. Let H be a matrix that selects those components of 
the state variable that influence the observation, and assume that we have an approximation of the 
likelihood of the form l{Hx\y) ~ tp(g(y); Hx, R(y)). Then we can use this approximation for an 
ensemble Kalman filter update, and correct by a particle filter update with weights proportional to 
£(Hxj\y)(<p(g(y); Hx™, R(y)))^ 1 . In order to construct an approximation of the likelihood of the 
above form, we can use a Taylor approximation 

logi(Hx\y) « log£(H^\y) + a(y)'H{x -^) + l( x - ^)'H'b{y)H{x - 

where a{y) and b{y) are the gradient and the Hessian, respectively, of the log likelihood. Then 
R(y) = —b(y)^ 1 and g(y) — R(y)a(y). Alternatively, one could center the expansion at the mode 
of likelihood. Such an approximation is expected to work well in cases where the likelihood is 
log-concave, e.g., when y given x is Poisson with parameter exp(ir). 
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Appendix 



Proof of Theore m [H Convergence of it- 



u,N 
EnKPF 



to tt u implies convergence of J2f=i ^ X U,N to 



7r", see Lemma 7 in iFrei and Kiinschl (|201lh . It remains to establish the former convergence. To 
begin with, we introduce some notation used throughout the remainder. For ease of legibility, the 
dependence on N is dropped. An overbar ' is added to any quantity to denote its population 
counterpart, in which P p has been replaced by P p . For a measure ir and a function h, we write 
ixh = J h(x)tr(dx). Expressions of the form A — > B are shorthand for A converges to B almost 
surely as N goes to oo. Straightforward application of the strong law of large numbers shows that 
the population version tTe'iiKPF converges to some nonrandom limit, and it is clear by construction 
that this limit equals ir w . Hence, it remains to prove that the population version of the ensemble 
Kalman particle filter has the same limit as the ensemble Kalman particle filter, i.e., we need to 
prove that 



(15) 



KeiiKPf' 1 ~ '"'EnKPF ^ I 



for any continuous and bounded function h. In addition, we may assume that h is compactly 
supported on, say, Sh C R 9 , since these functions are still convergence determining for the weak 
topology. Write \\h\\oo — max x |ft(a:)|. 



Observe that 



'EnKPF' 



^EnKPF^I — 



N 

E 



a"N(tf,P u )h- 



N 



N 

N 



£c^(/ij, p-)h - ^cqM(jq, P u )h 



3 = 1 



(16) 



In the following, we show that both terms on the right-hand side of (|16|) converge to 0, which proves 

(p~5|> . For later use, we note that P p — » P p , and hence by continuity, 

(17) 

K( 7 P p ) -> K^PP), Q( 7 , Pp) Q( 7 , P p ), A-((l- 7 )Q( 7j P p )) K((l-i)Q( 7 , PP)), P" -> P* 



The first term in (|T6|) can be bounded by \\h\ 
unnormalized weights in ©, i.e., 



A,=i 



Let wj and Wj denote the 



W : 



tp(y;H^,HQ( 7 ,PP)H' 



1 



Observe that 



^<( P (0;0, J ffg(7,P p )F' 



1 



1-7 
i?)<^(0;0 



R). 



1 



-R), 



1 — 7 1 — 7 

where the last inequality follows from det(M + N) > det(M) for arbitrary positive definite M and 
positive semi-definite N. The same bound holds true for W". Notice that 
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and \avew u — aveuT"! < ^ SjLi \ w j ~ wnere avew" = ^ S^Li an d avew" = XwLi ^ ■ 
The Wj are iid, hence, avew" converges almost surely, and we conclude that X)j=i \ a j L — SJ| — > if 



(18) 



1 w 



To show (fT5)l . we fix a compact set Del'. Then we have 



max 

l<j<N:x^eD 



(19) 



1 1 N 

<2^(0;0,- j R)t7E 1 - p ^+ max P 

1-7 A ^— ' J T Kj<N:x?£D 



i 



-+2^(0;0,- fl).PK^], 

1 — 7 



where the "max" -term goes to zero for reasons of uniform continuity in combination with (|17[) . 
Letting D|R ? , establishes (jT5f by virtue of dominated convergence. 

We now analyze the second term in (|16j). Again, we fix a compact set Del'. Then we have 



JV 



< max 

l<j<N:x p GD 



S h 



dz 



N 



<tD 



.7 = 1 



E 



D 



EK] ' 

where the "max" -term goes to zero for reasons of uniform continuity in combination with (|17p . 
Letting DfR 5 , shows that also the second term in ((T6| converges to 0, which completes the proof. 



Proof of Lemma [TJ We set 

Z = exp(-i(^ - yP)'C^ - //) + ^ - M P ))- 

Then by definition 

Var[ar] ' (M.iV 

A 2 VE[Z] 2 / 

The lemma follows by completing the square and using that Gaussian densities integrate to one. 
More precisely, for any x and any positive definite matrix T: 

x\c 1 + r> - 2d' 7 x = ( x - (c 7 + rrXyCGy + r)(x - (c 7 + r) -1 ^) - d 7 (c 7 + rp 1 ^. 

Therefore 

E[Z] = (det(P p ) det(C 7 + (F^^ 1 ))" 172 exp(id 7 (C 7 + (PP)- 1 )^) 

and 

E [Z 2 ] = (det(P p )det(2C 7 + (P p )- 1 )) _1/2 exp(2d;(2C 7 + (PPy 1 )- 1 ^)- 
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Taking these results together, the first claim follows. 
For the second claim, we note that as 7 t 1 



C 7 ~ (1 - 7)fl"(/ - K'^^R)- 1 ^ 
d 7 ~ (1 - 7)ff'(J - K' l H')R)- 1 {I 



HK\)H, 
HKx){y — Hjj?) 



because X 7 and Q 7 are continuous. The result then follows by a straightforward computation. 
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